Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech

نویسندگان

  • Ya Li
  • Jianhua Tao
  • Keikichi Hirose
  • Xiaoying Xu
  • Wei Lai
چکیده

Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced in modeling the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network based prosodic variation models are proposed and then used in stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macroand microcharacteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems. 2015 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical stress generation with Fujisaki model in expressive speech synthesis

This paper introduces a hierarchical stress generation for expressive speech synthesis. In the previous study, we proposed a novel hierarchical Mandarin stress modeling method, and the text-based stress prediction experiments demonstrates a reliable stress assignment can be obtained from textual features. However, the stress model should be further verified to be an effective and efficient pros...

متن کامل

Hierarchical Stress Modeling in Mandarin Text-to-Speech

Automatic stress prediction is helpful for both speech synthesis and natural speech understanding. This paper proposes a novel hierarchical Mandarin stress modeling method. The top level emphasizes stressed syllables, while the bottom level focuses on unstressed syllables for the first time due to its importance in both naturalness and expressiveness of synthetic speech. Maximum Entropy model i...

متن کامل

Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese Tts Synthesis

This paper proposed a novel approach for describing the expressivity of dialog text and modelling their acoustic correlates for expressive text-to-speech (TTS) synthesis. We applied the Dialog Acts (DAs) in describing expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect the tourism domain corpus and annotated the DAs. A Pitch Target model which is opt...

متن کامل

Intonation modeling of Mandarin Chinese using a superpositional approach

The intonation model is an important component in text-tospeech systems to obtain natural and expressive speech synthesis. In this paper we propose a superpositional model for Mandarin Chinese. The intonation model is composed of the syllable and the phrase component. The parameters of the model are estimated using JEMA, a training approach with many advantages related to robustness and precisi...

متن کامل

Prosodic modeling in large vocabulary Mandarin speech recognition

The issue of incorporating prosodic information into speech recognition processes has emerged in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling considering two-level hierarchical prosodic information for Mandarin Chinese. We developed a GMM-based, a decision-tree-based, and a hybrid approach. The best improvements in character r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Speech Communication

دوره 72  شماره 

صفحات  -

تاریخ انتشار 2015